Model Selection

High-precision Reward Model

Skywork Reward Llama 3.1 8B V0.2

A reward model built on Llama-3.1-8B-Instruct and trained on 80K high-quality preference pairs, designed to judge preferences accurately in complex scenarios.
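To illustrate what training on preference pairs means, here is a minimal sketch of a Bradley-Terry style pairwise loss, a common objective for reward models. The listing does not state which objective this model actually uses, so the function name and the loss choice are assumptions for illustration only.

```python
import math


def pairwise_preference_loss(chosen_score: float, rejected_score: float) -> float:
    """Hypothetical helper: -log(sigmoid(chosen_score - rejected_score)).

    The loss is small when the reward model scores the preferred
    response well above the rejected one, and large when the order
    is reversed.
    """
    margin = chosen_score - rejected_score
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of the
    # margin so exp() never receives a large positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # One preference pair: scalar scores for the chosen and rejected responses.
    print(pairwise_preference_loss(2.0, 0.5))
    print(pairwise_preference_loss(0.5, 2.0))
```

Each of the 80K pairs would contribute one such term; minimizing the average pushes the score of the preferred response above that of the rejected one.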

Large Language Model

PPO LunarLander V2

A reinforcement learning agent trained with the PPO (Proximal Policy Optimization) algorithm to solve the landing task in the LunarLander-v2 environment.
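For readers unfamiliar with PPO, the core of the algorithm is a clipped surrogate objective. The sketch below computes it for a single sample; the function name is hypothetical, and the `clip_eps` value of 0.2 is a commonly used default, not a setting confirmed for this model.

```python
def ppo_clipped_objective(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """Per-sample PPO clipped surrogate objective (to be maximized).

    ratio     -- new-policy probability of the action divided by the
                 old-policy probability of the same action
    advantage -- estimated advantage of that action
    clip_eps  -- clipping range (assumed value, see lead-in)
    """
    # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Taking the minimum removes any incentive to move the policy
    # far from the one that collected the data.
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    print(ppo_clipped_objective(ratio=1.5, advantage=1.0))
    print(ppo_clipped_objective(ratio=0.5, advantage=-1.0))
```

In training, this value is averaged over a batch of environment steps and maximized by gradient ascent on the policy parameters.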


© 2025 AIbase